Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374
Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374
Conversation
Overall AssessmentExecutive Summary
Methodology
Code Quality
PerformanceNo findings in the changed diff. MaintainabilityNo findings beyond the contract bug above. Tech DebtNo mitigating SecurityNo findings in the changed diff. Documentation/Tests
Path to Approval
|
…wording
P1: `path_sup_t_bands` now returns `{}` (not `None`) when `by_path +
n_bootstrap > 0` is requested but `path_effects == {}` (no observed
path has a complete window). Mirrors the documented empty-state
contract used by `path_effects` and `path_placebo_event_study`.
The bug: `_collect_path_bootstrap_inputs` is gated on
`len(path_effects) > 0`, so when `path_effects == {}` the bootstrap
collector is skipped, `bootstrap_results.path_cband_crit_values` stays
`None`, and the result-class kwarg builder mapped `None` to
`path_sup_t_bands = None` — losing the requested-vs-empty distinction.
Fix: the kwarg builder now keys off `self.by_path is not None and
self.n_bootstrap > 0`. When that condition holds we always materialize
a dict (empty when no path passed both gates), reserving `None` only
for "feature not requested." When `path_cband_crit_values is None` we
treat it as the empty case for the dict comprehension.
P3: replaced "Hall-Mammen multiplier weight matrix" with neutral
"multiplier weight matrix (using the estimator's configured
`bootstrap_weights` — Rademacher / Mammen / Webb)" in CHANGELOG,
REGISTRY, helper docstring, and TestByPathSupTBands docstring. The
implementation honors `self.bootstrap_weights`, so the prose
shouldn't fix the family.
Regression test added: `test_path_sup_t_bands_empty_dict_when_no_complete_window`
on the same empty-window panel as the analytical sibling at
`test_empty_path_surface_when_no_complete_window` (`:4015+`),
asserting `path_effects == {}`, `path_sup_t_bands == {}`, and no
horizon writes a `cband_conf_int` key.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Code gate is `finite_mask.sum() > 0.5 * n_bootstrap` (strict majority,
mirrors OVERALL `:612`). Prose said `>=50%` in 8 places; aligned to
"strict majority (more than 50%)" / `>50%`:
- `CHANGELOG.md` Unreleased entry
- `docs/methodology/REGISTRY.md` per-path joint sup-t Note (added
explicit "Exactly half-finite draws are NOT enough" clarification)
- `diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py` helper comment
- `diff_diff/chaisemartin_dhaultfoeuille_results.py`:
- `DCDHBootstrapResults.path_cband_crit_values` field comment
- `ChaisemartinDHaultfoeuilleResults.sup_t_bands` Attributes (the
OVERALL surface — pre-audit fix updated this in R0; my new prose
introduced the >=50% wording so I'm correcting it here too)
- `ChaisemartinDHaultfoeuilleResults.path_sup_t_bands` Attributes
- `path_sup_t_bands` field comment
- `tests/test_chaisemartin_dhaultfoeuille.py`:
- `TestByPathSupTBands.test_path_sup_t_absent_path_has_no_cband_keys`
docstring
Boundary regression added per reviewer ask:
`test_path_sup_t_strict_majority_gate_at_exact_50pct` monkeypatches
`_generate_psu_or_group_weights` to inject NaN into exactly half of
the bootstrap rows. With `n_bootstrap=4` and 2 NaN rows,
`finite_mask.sum() == 2 == 0.5 * 4`, so the gate `2 > 2.0` is False.
Asserts both `path_sup_t_bands == {}` and `sup_t_bands is None`, plus
no horizon writes `cband_conf_int` (per-path or OVERALL). Pins the
strict-majority semantics on both sup-t surfaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
No other unmitigated methodology findings. The Python-only extension itself, the fresh shared-draw asymmetry, and the inherited cross-path SE deviation are documented, so they are not defects under the stated rubric. Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`. Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos). **Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests. **Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`. **Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream). **Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface — Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description. Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b1737f3 to
c0c0d4e
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. The new per-path sup-t surface matches the documented contract in Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No findings. I did not find a new untracked correctness limitation that would need Security No findings. Documentation/Tests No findings from the diff. The new tests are well targeted for the main risk areas here: requested-but-empty output, absent-key behavior for invalid bands, seed reproducibility, summary rendering, and the exact-50% boundary in |
Cross-surface gap caught in self-audit: OVERALL `to_dataframe(level= "event_study")` includes `cband_lower` / `cband_upper` columns (`chaisemartin_dhaultfoeuille_results.py:1495-1496,1531-1532`) but the per-path table at `level="by_path"` does not — even though per-path now produces `cband_conf_int` writes via the new sup-t propagation block. Cross-surface twin asymmetry the CI reviewer didn't flag; caught by my own grep audit on `cband_conf_int` consumers. Fix: extend `to_dataframe(level="by_path")` to emit the same two columns. Populated for positive-horizon rows of paths with a finite sup-t crit (read from `path_effects[path]["horizons"][l] ["cband_conf_int"]`); NaN for placebo rows (no joint band per the positive-only sup-t spec), unbanded paths, and the requested-but-empty fallback DataFrame (which now includes the columns in its canonical schema). Tests added: - `test_path_sup_t_to_dataframe_emits_cband_columns` — column presence + per-row alignment with the dict surface - `test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns` — empty-path fallback DataFrame schema parity Docs updated: - REGISTRY.md: `to_dataframe(level="by_path")` integration note added to the new sup-t Note; canonical column list in the existing `Note (Phase 3 by_path ...)` block extended with `cband_lower / cband_upper` - CHANGELOG entry: surface listing now mentions to_dataframe columns - `by_path` parameter docstring: rendering surface listing extended - `path_sup_t_bands` Attributes docstring: rendering surface listing extended Suite: 263 tests pass (was 261, +2 new tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No findings. I did not see a new deferred item that needs Security No findings. Documentation/Tests
Execution note: local test execution was not possible here because |
…erage P3 #1: ``to_dataframe`` method docstring at ``chaisemartin_dhaultfoeuille_results.py:1375-1379`` listed the pre-change ``level="by_path"`` schema (no ``cband_*`` columns) even though the implementation now returns them. Updated the bullet to include ``cband_lower / cband_upper``, document the negative-horizon placebo convention, and document the NaN-on-absent-band behavior. P3 #2: ``TestByPathSupTBands::test_path_sup_t_seed_reproducibility`` only exercised the default ``rademacher`` weight family. Parameterized over ``["rademacher", "mammen", "webb"]`` to pin that the per-path sup-t branch correctly threads ``self.bootstrap_weights`` through ``_generate_psu_or_group_weights`` for all three multiplier families the feature advertises. The existing OVERALL machinery handles all three uniformly, but the per-path surface lacked direct coverage. Each variant must produce a finite, reproducible crit on the standard 3-path fixture. 17 tests pass on TestByPathSupTBands (was 15: +2 new parameterized variants on the existing seed_reproducibility test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X). Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract. Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort- sharing deviation). Changes: - Delete the gate at chaisemartin_dhaultfoeuille.py:988-992 - Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph) - New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values - New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py - New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round- trip / multi-covariate) - Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs - Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto- inheritance) - CHANGELOG: Unreleased > Added entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
1..L_maxwithin each path whenn_bootstrap > 0is set withby_path=k. A single shared(n_bootstrap, n_eligible)Hall-Mammen multiplier weight matrix is drawn per path and broadcast across all valid horizons; the path-specific critical valuec_p = quantile(max_l |t_l|, 1-α)is applied per horizon ascband_conf_intand surfaced at top level asresults.path_sup_t_bands[path].by_pathfollow-up sequence (dCDH: addby_pathper-path event-study disaggregation #357 foundation, dCDH by_path R-parity fixtures + TestDCDHDynRParityByPath #360 R-parity, dCDH by_path + n_bootstrap support (library-consistent percentile CI) #364 bootstrap, dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3) #371 placebos).sup_t_bandsfield replaced with the actual contract description.Methodology
Closely parallels the OVERALL
event_study_sup_t_bandspattern atchaisemartin_dhaultfoeuille_bootstrap.py:599-614, with one documented intentional asymmetry: per-path sup-t draws a fresh shared-weights matrix per path AFTER the per-path SE bootstrap block has already populatedresults.path_sesvia independent per-(path, horizon) draws. Numerator uses fresh shared draws; denominator uses bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical — the fresh draw preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable.Gates (mirror OVERALL
:605,612): a path needs>= 2valid horizons (finite bootstrap SE > 0) AND>= 50%finite sup-t draws to receive a band; otherwise the path is absent frompath_sup_t_bands(NaN-on-invalid absent-key contract).Empty-state:
path_sup_t_bands is Nonewhen not requested (no bootstrap orby_path is None);{}when requested but no path passes both gates.Interpretation: bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths (different inference target, deferred).
Deviation from R
did_multiplegt_dynprovides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALLevent_study_sup_t_bands(also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented forpath_effects(the bootstrap SE used in the t-stat denominator carries the same deviation).Test plan
pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands -v -m slow— 11 new tests passpytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath" -m "slow or not slow"— 71 passed (regression on existing per-path tests)pytest tests/test_chaisemartin_dhaultfoeuille.py -k "sup_t or cband"— OVERALL sup-t bit-identicalpytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_methodology_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py -m "slow or not slow"— 259 passedblack+ruffclean on changed source filesmulti_path_reversible_by_path(n_bootstrap=200, seed=42, L_max=3) shows per-path crits in expected range[1.96, 2.394](between marginal and Bonferroni)🤖 Generated with Claude Code